stochastic rule
Towards a Unified Quadrature Framework for Large-Scale Kernel Machines
Liu, Fanghui, Huang, Xiaolin, Chen, Yudong, Suykens, Johan A. K.
In this paper, we develop a quadrature framework for large-scale kernel machines via a numerical multiple integration representation. Leveraging the fact that the integration domain and measure of typical kernels, e.g., Gaussian kernels, arc-cosine kernels, are fully symmetric, we introduce a deterministic fully symmetric interpolatory rule to efficiently compute its quadrature nodes and associated weights to approximate such typical kernels. This interpolatory rule is able to reduce the number of needed nodes while retaining a high approximation accuracy. Further, we randomize the above deterministic rule such that the proposed stochastic version can generate dimension-adaptive feature mappings for kernel approximation. Our stochastic rule has the nice statistical properties of unbiasedness and variance reduction with fast convergence rate. In addition, we elucidate the relationship between our deterministic/stochastic interpolatory rules and current quadrature rules for kernel approximation, including the sparse grids quadrature and stochastic spherical-radial rule, thereby unifying these methods under our framework. Experimental results on several benchmark datasets show that our fully symmetric interpolatory rule compares favorably with other representative random features based methods.
A Construction of Bayesian Networks from Databases Based on an MDL Principle
This paper addresses learning stochastic rules especially on an inter-attribute relation based on a Minimum Description Length (MDL) principle with a finite number of examples, assuming an application to the design of intelligent relational database systems. The stochastic rule in this paper consists of a model giving the structure like the dependencies of a Bayesian Belief Network (BBN) and some stochastic parameters each indicating a conditional probability of an attribute value given the state determined by the other attributes' values in the same record. Especially, we propose the extended version of the algorithm of Chow and Liu in that our learning algorithm selects the model in the range where the dependencies among the attributes are represented by some general plural number of trees.
PAC Classification based on PAC Estimates of Label Class Distributions
Palmer, Nick, Goldberg, Paul W.
A standard approach in pattern classification is to estimate the distributions of the label classes, and then to apply the Bayes classifier to the estimates of the distributions in order to classify unlabeled examples. As one might expect, the better our estimates of the label class distributions, the better the resulting classifier will be. In this paper we make this observation precise by identifying risk bounds of a classifier in terms of the quality of the estimates of the label class distributions. We show how PAC learnability relates to estimates of the distributions that have a PAC guarantee on their $L_1$ distance from the true distribution, and we bound the increase in negative log likelihood risk in terms of PAC bounds on the KL-divergence. We give an inefficient but general-purpose smoothing method for converting an estimated distribution that is good under the $L_1$ metric into a distribution that is good under the KL-divergence.